(54) Indexing of recordings

(57) A recording is indexed by keywords. To perform the
indexing, an audio portion (12) of the recording is
transcribed (31) to produce text in a text file. A time
stamp (32) is associated with each word in the text. Each
time stamp (32) indicates the time in the recording at which
the associated word occurs. Once a recording has been
indexed, it may be searched along with other recordings. For
example, in response to a user choosing a keyword (46), a
text file for each recording is searched for occurrences of
the keyword (46). At the conclusion of the search, each
recording which includes an occurrence of the keyword is
listed (42). When a user selects (42) a first recording and a
particular occurrence of the keyword (46), the first
recording is played starting slightly before a time
corresponding to a first time stamp associated with the
particular occurrence of the keyword in the first recording.
In response to control sequences, prior and next occurrences
of the keyword (42) can be observed in one or multiple
recordings.
Description

The present invention relates to indexing of a recording
for use, for example, in performing content searches within
digital audio and audio-video recordings, using keywords to
index digital audio and audio-video recordings.

Improvements in storage and compression technol- 
ogies have allowed a revolution in multimedia. Audio re- 
cordings are now often stored in digital format. In addi- 
tion, it is now feasible to convert full length movies into 
digital audio-video (video) recordings for replay. Using 
digital video, a user may, with off-the-shelf software 
products, access and edit full-screen, full-motion video 
recordings. 

In order to make the best use of a computer's ability 
to manipulate digital audio and audio-video recordings, 
it is desirable to have some way to perform content 
searches. Currently, the ability to perform content 
searching is significantly limited or non-existent. 

There exists some limited ability in the art to perform
content searches of images. See, for example, the QBIC
Project by IBM Corporation, having a business address of
650 Harry Road, San Jose, California 95120. However, such
searching of images on content is limited to visual content
and is not capable of performing content searches on digital
audio recordings.

The present invention seeks to provide indexing of 
recordings for performing content searches. 

According to an aspect of the present invention 
there is provided a method of indexing a recording as 
specified in claim 1.

The method may index all text words. 

Preferably, in response to a user choosing a key- 
word, the method searches the set of words for all oc- 
currences of the keyword. Advantageously, this step in- 
cludes listing all occurrences of the found keyword and, 
in response to selection of an occurrence of the listed
keyword, playing the recording, preferably starting 
slightly before a time corresponding to a time stamp as- 
sociated with the selected occurrence of the keyword. 

The method is applicable to various types of record- 
ing, including audio and audio-video recordings. 

According to another aspect of the present inven- 
tion there is provided a method of accessing selections 
within a plurality of recordings as specified in claim 7. 

According to another aspect of the present inven- 
tion there is provided a system for accessing selections 
within a plurality of recordings as specified in claim 10. 

It will also be apparent that the invention extends to
a system for indexing a recording. 

In the preferred embodiment, a recording is indexed 
by keywords. In order to perform the indexing, an audio 
portion of the recording is transcribed to produce text in 
a text file. The transcription may be performed, for ex- 
ample, manually by a transcriber or using speech rec- 
ognition technology. After transcription, a time stamp is 
associated with each word in the text. Each time stamp 
indicates a time in the recording at which the associated
word occurs. The time stamps may be added to the text file,
for example, using speech recognition technology.
Alternately, the time stamps may be added to the text file
by an operator using a computing system. For example, the
text is displayed in a first window of a computer display.
The recording is displayed in a second window of the
computer display. Upon the operator selecting (or upon
automatic selection of) a selected word of the text in the
first window, a time stamp is added to the text file which
indicates an elapsed time from a beginning of the recording
until selection by the operator of the selected word. Once
the operator has in this way or by some other method
assigned time stamps to a subset of words in the text,
interpolation may be used to assign time stamps to the
remaining words in the text which are not within the subset
of words assigned time stamps by the operator. Once time
stamps have been assigned to each word in the text, the
words and associated time stamps may be arranged in a
balanced tree for efficient access by a search program.
Other search techniques can be used instead of the balanced
tree. For example, a binary tree can be used.

The preferred embodiment also provides for keyword
searching of a plurality of recordings, each with an
associated text file created as described above. In response
to a user choosing a keyword, a text file/balanced tree for
each recording is searched for occurrences of the keyword.
At the conclusion of the search, each recording which
includes an occurrence of the keyword is listed. When a user
selects a first recording and a particular occurrence of the
keyword, the first recording is played starting slightly
before a time corresponding to a first time stamp associated
with the particular occurrence of the keyword in the first
recording.

For example, after searching on a keyword, the recordings
may be listed as follows. The list of recordings which
include an occurrence of the keyword is displayed in a first
window of a computer display. One of the recordings from the
list of recordings displayed in the first window is
highlighted. A user may select which recording is
highlighted. In one embodiment, upon a user selecting a
particular recording, a first-in-time occurrence of the
keyword within the particular recording is played. Keystroke
commands may be used to jump to other occurrences. In an
alternate embodiment, when a recording displayed in the
first window is highlighted, each of the occurrences of the
keyword within the highlighted recording is listed. This may
be done, for example, in a second window of the computer
display.

Variations may be made to the preferred embodiments. For
example, in addition to allowing searching on a single
keyword, searching may be performed on multiple keywords
connected by Boolean logic or may be performed on concepts.
It is also envisaged that the process may be entirely
automated in some applications.

The present invention can allow for efficient content 
searching of recordings, improving over other currently
available schemes to index recordings. 

An embodiment of the present invention is de- 
scribed below, by way of example only, with reference 
to the accompanying drawings, in which: 

Figure 1 illustrates steps taken to allow keyword in- 
dexing of digital recordings in accordance with the pre- 
ferred embodiment. 

Figure 2 is a flowchart which shows steps by which 
text for a digital recording is keyword indexed in accord- 
ance with the preferred embodiment. 

Figure 3 and Figure 4 show computing displays 
which illustrate the preparation of a data base used for 
keyword indexing of digital recordings in accordance 
with the preferred embodiment. 

Figure 5 shows a computing display used for key- 
word index searches of a video library in accordance 
with the preferred embodiment. 

Figure 6 shows a computing display used for key- 
word index searches of a video library in accordance 
with an alternate embodiment. 

Figure 1 illustrates steps taken to allow keyword in- 
dexing of digital recordings. A recording source 11 is dig- 
itized and compressed to produce digitized recording 
file 13. Recording source 11 is, for example, an audio
recording or an audio-video recording. When recording 
source 11 is an audio-video recording, data in digitized 
recording file 13 is, for example, stored in MPEG-1 for- 
mat. Digitized recording file 13 may be produced from 
analog recording source 11 using, for example, the
OptiVideo MPEG 1 Encoder available from OptiVision, having
a business address of 3450 Hillview Ave., Palo Alto, CA
94304.

In addition, the audio portion of recording source 11
is transcribed to produce a text file 12 which includes
the text. The transcription may be performed manually.
Alternately, the audio portion of recording source 11 may
be transcribed directly from recording source 11 or
digitized recording file 13 using computerized speech
recognition technology such as DragonDictate for Windows
available from Dragon Systems, Inc., having a business
address of 320 Nevada Street, Newton, MA 02160. Text
file 12 and digitized recording file 13 are then made
available to a computer system 14.

Figure 2 is a flowchart which shows steps by which
text for a digitized recording file 13 is keyword indexed.
In a step 31, text is produced which corresponds to the
audio portion of digitized recording file 13. This text is
a result of the transcription described above.

Figure 3 illustrates the result of the transcription 
process. Figure 3 shows a window 23 in a computer 
screen 21. Within window 23 is the transcribed text of 
the audio portion of recording file 13.

In a step 32, shown in Figure 2, time stamps associated
with words in the text are added to the transcribed text.
In the preferred embodiment, the time stamps are in
milliseconds and indicate elapse of time relative to the
starting point of the digital recording within recording
file 13.

Placement of time stamps may be performed, for
example, with the help of an operator utilizing, on
computer 14 (shown in Figure 1), software specifically
designed to add time stamps. For example, the recording
is played by computer 14. For an audio-video recording,
a window 22 in computer screen 21, as shown in Figure 3,
may be added in which the audio-video recording is
played. The operator of computer 14, using cursor 24,
selects words as they are spoken in the recording
played by computer 14. Whenever the operator selects
with cursor 24 a word from the text in window 23, the
software running on computer 14 time stamps the word
with the current elapsed time, that is, the time elapsed
since the starting point of the digital recording.
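
The operator-driven stamping just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the software referred to in the text: the class name `StampingSession`, its methods, and the injectable `clock` argument are hypothetical. The one behaviour taken from the description is that selecting a word records the time elapsed, in milliseconds, since the start of the recording.

```python
import time


class StampingSession:
    """Hypothetical helper: stamps a word with elapsed playback time."""

    def __init__(self, words, clock=time.monotonic):
        self.words = list(words)
        # One slot per word; None means "not yet stamped by the operator".
        self.stamps = [None] * len(self.words)
        self._clock = clock
        self._start = None

    def start_playback(self):
        # Remember the moment the recording started playing.
        self._start = self._clock()

    def select_word(self, index):
        # Called when the operator selects word `index` in the text window.
        if self._start is None:
            raise RuntimeError("playback has not started")
        elapsed_ms = int(round((self._clock() - self._start) * 1000))
        self.stamps[index] = elapsed_ms
        return elapsed_ms
```

Words the operator never selects keep a stamp of `None`, which matches the later step in which the remaining words receive interpolated stamps.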

Figure 4 further illustrates this process. In Figure 4,
time stamps TS1, TS2 and TS3 have been added to text
23 by an operator as described above. Source code for
software which implements the time stamp feature
discussed above for audio-video recordings will be
apparent to the skilled person. Alternately, step 32, shown
in Figure 2, may be automated so that speech recognition
technology is used to trigger the placement of time
stamps within text 23.

After the time stamps have been added to text 23,
in a step 33 shown in Figure 2, every word of text 23 is
assigned a time stamp. For words which were not
assigned a time stamp in step 32, interpolation is used to
determine an appropriate time stamp.

For example, Table 1 below shows a portion of text 
23 after the completion of step 32. 

Table 1

Once::11 upon a time::20 there was a boy::28
named Fred. He went::35 to the forest::44. ...

In the example given in Table 1, the word "Once" was
spoken at 11 milliseconds from the beginning of the audio
track of the digital recording. Likewise, the word "time"
was spoken at 20 milliseconds, the word "boy" at 28
milliseconds, the word "went" at 35 milliseconds, and the
word "forest" at 44 milliseconds from the beginning of the
audio track of the digital recording.
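
Text in the Table 1 notation can be read mechanically. The sketch below is an assumption about how such text might be parsed; the function name `parse_stamped_text` and the exact token pattern are not taken from the source. It treats `::` followed by digits as the time stamp of the preceding word and reports `None` for words that have no stamp yet.

```python
import re

# A word, optionally followed by "::" and a millisecond time stamp.
_TOKEN = re.compile(r"([A-Za-z']+)(?:\s*::\s*(\d+))?")


def parse_stamped_text(text):
    """Return a list of (word, stamp_ms or None) pairs in spoken order."""
    pairs = []
    for word, stamp in _TOKEN.findall(text):
        pairs.append((word, int(stamp) if stamp else None))
    return pairs
```

Punctuation and the trailing ellipsis are ignored, so "Fred." yields the bare word "Fred", consistent with the one-word-per-line form used for the output file.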

In order to assign time stamps to the remainder of
the words, interpolation is used. For example, nine
milliseconds elapsed between the word "Once" and the
word "time". There are two words, "upon" and "a", which
occur between "Once" and "time". As a result of the
interpolation, the words "upon" and "a" are assigned time
stamps of 14 milliseconds and 17 milliseconds,
respectively. This allocates three milliseconds between
the occurrence of the word "Once" and the word "upon",
three milliseconds between the word "upon" and the word
"a", and three milliseconds between the word "a" and the
word "time".

The words and their time stamps are placed in an 
output file. For example, the output file may have on 
each line a single word, separated by a tab character 
from a time stamp for the word. Table 2 below shows 
the form of the file for the example text file shown in
Table 1 above:



Table 2

Once      11
upon      14
a         17
time      20
there     22
was       24
a         26
boy       28
named     30
Fred      32
He        34
went      35
to        38
the       41
forest    44
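
The interpolation of step 33 and the one-word-per-line output form shown in Table 2 can be sketched as follows. This is not the source code referred to in the text. The function names `interpolate_stamps` and `format_index` are hypothetical, and the rounding rule (a whole-millisecond step, rounded to nearest, clamped so it never passes the next stamped word) is an assumption chosen because it reproduces the Table 2 values, for example 30, 32 and 34 for "named", "Fred" and "He" between 28 and 35.

```python
def interpolate_stamps(pairs):
    """Fill in missing stamps between operator-stamped words.

    `pairs` is a list of (word, stamp_ms or None). Assumes the first and
    last words carry stamps, as in the Table 1 example.
    """
    words = [w for w, _ in pairs]
    stamps = [s for _, s in pairs]
    anchors = [i for i, s in enumerate(stamps) if s is not None]
    if not anchors or anchors[0] != 0 or anchors[-1] != len(stamps) - 1:
        raise ValueError("first and last words must be stamped")
    for left, right in zip(anchors, anchors[1:]):
        intervals = right - left             # gaps between consecutive words
        span = stamps[right] - stamps[left]  # elapsed ms between the anchors
        # Whole-millisecond step, rounded to nearest (an assumption).
        step = (span + intervals // 2) // intervals
        for k in range(1, intervals):
            # Clamp so an interpolated stamp never exceeds the next anchor.
            stamps[left + k] = min(stamps[left] + k * step, stamps[right])
    return list(zip(words, stamps))


def format_index(pairs):
    """Render one 'word<TAB>stamp' line per word."""
    return "".join(f"{word}\t{stamp}\n" for word, stamp in pairs)
```

Exact fractional interpolation would be an equally valid choice; the integer step is used here only because the example values in the text are whole milliseconds.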



Source code for software which implements the
interpolation feature discussed above will be apparent to
the skilled person. Alternately, in step 32 every word may
be assigned a time stamp, for example using speech
recognition technology, so that no interpolation is
necessary. Using speech recognition technology, words
may be transcribed and time stamped simultaneously.
Alternately, speech recognition technology may be used
in a separate pass in which time stamps are added to a
transcription of the text. When used in a separate pass
to add time stamps to words in a text, the speech
recognition software adds time stamps for unrecognized
words by interpolation, as described above.

In a step 34 shown in Figure 2, a balanced tree is built
which allows fast access of words within the output file.
The balanced tree is built, for example, using an algorithm
known in the art. See, for example, Robert Sedgewick,
"Algorithms in C++", Addison-Wesley Publishing Company,
1992, pp. 215-229. Appendix C includes source code
for software which implements the construction of the
balanced tree as set out in step 34. Alternately a binary
tree or other searching algorithm may be used. In other
embodiments, searching may be performed directly on
the output file constructed in step 33.
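
A keyword index over the word and time stamp pairs can be sketched as follows. Note the substitution: this sketch does not build a balanced tree. It uses a sorted list searched by bisection, which is one of the "other searching algorithms" the text allows and gives the same logarithmic lookup with less code. The class name `KeywordIndex` and the case-insensitive matching are assumptions made for the example.

```python
import bisect


class KeywordIndex:
    """Sorted (word, stamp) list searched by bisection.

    Stand-in for the balanced tree: O(log n) lookup, simpler code.
    """

    def __init__(self, pairs):
        # Case-insensitive matching is an assumption, not stated in the text.
        self._entries = sorted((word.lower(), stamp) for word, stamp in pairs)

    def occurrences(self, keyword):
        """Return all time stamps (ms) for `keyword`, in ascending order."""
        key = keyword.lower()
        # (key,) sorts before every (key, stamp), so this finds the first hit.
        lo = bisect.bisect_left(self._entries, (key,))
        stamps = []
        for word, stamp in self._entries[lo:]:
            if word != key:
                break
            stamps.append(stamp)
        return stamps
```

Because entries are sorted by word and then by stamp, all occurrences of a keyword are adjacent and already in time order.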



The balanced tree constructed in step 34 serves as 
a keyword index of the digital recording. The balanced 
tree is accessed to locate where a word is spoken in the 
movie. 

For example, Figure 5 illustrates an interface on a
computer screen 51 which utilizes the keyword index
constructed as described above. In a box 56, a user
types one or more keywords connected by Boolean
variables. In a window 52, recordings are listed in which
the keyword(s) appear. The number of "hits" for a keyword
is listed next to the recording. In the preferred
embodiment, the recordings are listed in descending order
by the number of keyword occurrences. A user selects a
recording using cursor 54, cursor keys, or some other way.
When a recording is selected, for example using an "OK"
button 58 by the user, the portion of the selected
recording (listed in window 52) in which the first
occurrence of the selected keyword appears is played.
For an audio-video recording, the visual portion is
displayed in display window 55. The portion of the
recording is displayed for a configurable duration (e.g.,
two seconds) starting, for example, one second before the
occurrence of the keyword. Using keyboard commands,
a user can continue viewing the recording, fast forward,
reverse, skip to the next occurrence of the keyword, go
back to the last occurrence of the keyword, continue
playing and so on. The interface also includes a "cancel"
button 59. Source code for software which (in addition
to implementing the construction of the balanced tree)
implements the keyword searching of recordings, as
discussed above, will be apparent to the skilled person.
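
The listing order and playback window described for Figure 5 can be sketched as follows. The function names `search_library` and `playback_window`, the dictionary layout of `library`, and the tie-break by recording name are assumptions made for the example. The descending hit-count order, the one second lead, and the two second duration are the example values given in the text.

```python
def search_library(library, keyword):
    """Rank recordings by number of keyword occurrences.

    `library` maps a recording name to its list of (word, stamp_ms) pairs.
    Returns [(name, [stamps...]), ...], most hits first.
    """
    key = keyword.lower()
    results = []
    for name, pairs in library.items():
        stamps = sorted(s for w, s in pairs if w.lower() == key)
        if stamps:
            results.append((name, stamps))
    # Descending by hit count; ties broken by name (an assumption).
    results.sort(key=lambda item: (-len(item[1]), item[0]))
    return results


def playback_window(stamp_ms, lead_ms=1000, duration_ms=2000):
    """Return (start_ms, end_ms) of the clip played for one occurrence."""
    start = max(0, stamp_ms - lead_ms)
    return start, start + duration_ms
```

Selecting a recording from the ranked list would then play `playback_window(stamps[0])`, the first-in-time occurrence.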

The interface in Figure 5 may be enhanced to include
additional features. For example, Figure 6 illustrates an
interface on a computer screen 41 which utilizes the
keyword index constructed as described above. In a box 46,
a user types one or more keywords connected by Boolean
variables. In a window 42, recordings are listed in which
the keyword(s) appear. The number of times a keyword
appears in a recording is listed next to the recording. In
the preferred embodiment, the recordings are listed in
descending order by the number of keyword occurrences. A
user selects a recording using cursor 44, cursor keys, or
some other way. When a recording is selected, all the
occurrences of the keyword(s) are listed in a window 43.
In one embodiment, a fragment of text, along with the
time stamp, is displayed for each occurrence. Alternately,
only the keyword and time stamp, or only the time
stamp, are displayed for each occurrence.
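
The occurrence list of window 43 can be sketched as follows. The function name `occurrence_fragments` and the choice of two neighbouring words on each side are assumptions; the text states only that a fragment of text may be shown with each time stamp.

```python
def occurrence_fragments(pairs, keyword, context=2):
    """List (stamp_ms, fragment) for each occurrence of `keyword`.

    `pairs` is the (word, stamp_ms) list for one recording; `context` is
    the number of neighbouring words shown on each side of the keyword.
    """
    key = keyword.lower()
    rows = []
    for i, (word, stamp) in enumerate(pairs):
        if word.lower() == key:
            lo = max(0, i - context)
            hi = min(len(pairs), i + context + 1)
            fragment = " ".join(w for w, _ in pairs[lo:hi])
            rows.append((stamp, fragment))
    return rows
```

Setting `context=0` gives the alternative display in which only the keyword and its time stamp are shown.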

Upon selection of an "OK" button 48 by the user,
the portion of the selected recording (in window 44) in
which the selected keyword (in window 43) appears is
played in a digitized recording display window 45. The
portion of the recording is displayed for a configurable
duration (e.g., two seconds) starting, for example, one
second before the occurrence of the keyword. Using a
control panel 50, a user can continue viewing the
recording, fast forward, reverse, skip to the next
occurrence of the keyword, go back to the last occurrence of
the keyword, continue playing and so on. The interface
also includes a "cancel" button 49.
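
The "skip to the next occurrence" and "go back to the last occurrence" controls can be sketched as lookups in the sorted time stamps of the selected keyword. The class name `OccurrenceNavigator` and its method names are hypothetical, and the sketch assumes the player can report its current position in milliseconds.

```python
import bisect


class OccurrenceNavigator:
    """Steps through the time stamps of one keyword in one recording."""

    def __init__(self, stamps):
        self.stamps = sorted(stamps)

    def next_after(self, position_ms):
        """First occurrence strictly after the current position, or None."""
        i = bisect.bisect_right(self.stamps, position_ms)
        return self.stamps[i] if i < len(self.stamps) else None

    def previous_before(self, position_ms):
        """Last occurrence strictly before the current position, or None."""
        i = bisect.bisect_left(self.stamps, position_ms)
        return self.stamps[i - 1] if i > 0 else None
```

A return value of `None` signals that there is no further occurrence in that direction, at which point an interface could move on to another recording.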

In addition to searching on one or more keywords 
connected by Boolean variables, the balanced tree 
formed in step 34 (shown in Figure 2) may also be 
searched using concept based searching techniques, 
for example using Metamorph available from Thunderstone
Software-EPI, Inc., having a business address of
11115 Edgewater Drive, Cleveland, Ohio 44102.
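
Searching on several keywords combined by Boolean logic can be sketched with set operations over the recordings that contain each keyword. The function names and the `library` layout (recording name mapped to a list of word and time stamp pairs) are assumptions. Concept based searching, as offered by the product named above, is not attempted here.

```python
def recordings_with(library, keyword):
    """Set of recording names whose text contains `keyword`."""
    key = keyword.lower()
    return {name for name, pairs in library.items()
            if any(w.lower() == key for w, _ in pairs)}


def search_all(library, keywords):
    """Recordings containing every keyword (Boolean AND)."""
    sets = [recordings_with(library, k) for k in keywords]
    return set.intersection(*sets) if sets else set()


def search_any(library, keywords):
    """Recordings containing at least one keyword (Boolean OR)."""
    return set().union(*(recordings_with(library, k) for k in keywords))
```

More elaborate expressions can be composed from the same sets, for example `search_all(library, ["boy"]) - recordings_with(library, "forest")` for "boy AND NOT forest".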

The foregoing discussion discloses and describes 
merely exemplary methods and embodiments. 

The disclosures in United States patent application 
no. 08/576,106, from which this application claims pri- 
ority, and in the abstract accompanying this application 
are incorporated herein by reference. The US parent ap- 
plication also includes examples of the source codes 
mentioned herein. 



Claims 

1. A method of indexing a recording comprising the
steps of:

(a) transcribing (31) an audio portion of the
recording to produce text in a text file; and,

(b) providing (32) for each of a set of words in 
the text, a time stamp which indicates a time in 
the recording at which each word in the set of 
words occurs. 

2. A method as in claim 1 wherein step (a) is accom- 
plished manually by a transcriber or with the use of 
speech recognition technology. 

3. A method as in claim 2 wherein when step (a) is 
accomplished with the use of speech recognition 
technology, steps (a) and (b) are performed simul- 
taneously. 

4. A method as in claim 1, 2 or 3, wherein step (b)
includes the substeps of:

(b.1) providing for each of a subset of the set 
of words in the text, a time stamp which indi- 
cates a time in the recording at which each word 
in the subset of the set of words occurs; and, 
(b.2) for a remainder of the set of words which 
are not in the subset of the set of words, using 
interpolation to provide a time stamp which in- 
dicates a time in the recording at which each 
word in the remainder of the set of words oc- 
curs. 

5. A method as in claim 4 wherein the recording is an
audio-video recording and wherein substep (b.1)
includes the substeps of:

(b.1.1) displaying the text in a first window (23)
of a computer display;
(b.1.2) playing a video portion of the recording
in a second window (22) of the computer display; and,
(b.1.3) upon an operator selecting a selected
word of the text in the first window, adding a
time stamp (TS1...) to the text file which
indicates an elapsed time from a beginning of the
recording until selection by the operator of the
selected word.

6. A method as in any preceding claim, comprising the
step of:

(c) arranging the set of words and associated
time stamps into a balanced tree based on
occurrences of each word in the set of words.

7. A method of accessing selections within a plurality
of recordings, comprising the steps of:

(a) in response to a user choosing a keyword,
searching a plurality of text files for occurrences
of the keyword, wherein text files are associated
with recordings so that for each of the plurality
of recordings, one text file from the plurality
of text files includes a text of an audio portion
of the recording, each word in each text file
being associated with a time stamp (TS1...) which
indicates an approximate location in an associated
recording of an occurrence of the word;

(b) listing (44) recordings which include an
occurrence of the keyword; and,

(c) upon a user selecting a first recording and
a particular occurrence of the keyword, playing
the first recording starting slightly before a time
corresponding to a first time stamp associated
with the particular occurrence of the keyword in
the first recording.

8. A method as in claim 7 wherein in step (c) upon a
user selecting the first recording, a first-in-time
occurrence of the keyword within the first recording is
automatically selected as the particular occurrence
of the keyword.

9. A method as in claim 7 or 8 wherein step (b)
includes the substeps of:

(b.1) listing in a first window the recordings
which include an occurrence of the keyword;
(b.2) highlighting one of the recordings from the
recordings listed in the first window; and,
(b.3) listing each of the occurrences of the
keyword within the recording highlighted in substep
(b.2).

10. A system for accessing selections within a plurality
of recordings, comprising:

a plurality of text files (24), each text file
including a text of an audio portion of an associated
recording from the plurality of recordings;
search means for searching the text files for
occurrences of the keyword in response to selection
of a keyword; and
recording play means for playing the first
recording starting slightly before a time
corresponding to the particular occurrence of the
keyword in the first recording in response to
selection of a particular occurrence of the keyword
within a first recording.

11. A system as in claim 10 wherein the search means
includes a first keyword display (46) able to accept
from a user a specification of a particular keyword;
and a first window (42) operable to display a list of
recordings which include an occurrence of the
particular keyword.

12. A system as in claim 11 wherein the search means
includes a second display (42) operable to display
occurrences of the keyword within a recording
highlighted in the first window.

13. A system as in claim 10, 11 or 12, wherein the
search means is operable to search on a plurality
of keywords connected by Boolean logic and/or to
perform concept based searches on the keyword.